Pipelining Computation and Optimization Strategies for Scaling GROMACS on the Sunway Many-Core Processor
نویسندگان
چکیده
The increasing gap between plentiful computing elements and limited memory bandwidth makes it increasingly difficult and sometimes even infeasible for HPC community to port more applications onto many-core processor archi‐ tectures. The Sunway many-core processor SW26010 used to build the Sunway TaihuLight System contains a total of 260 heterogeneous cores. All these cores can be divided into 4 core groups (CGs). Each CG includes a Management Processing Element (MPE) core and 64 Computing Processing Elements (CPEs) cores. In this paper, we refactor an important molecular dynamics (MD) appli‐ cation GROMACS on the Sunway Taihulight system. By rewriting the computeintensive kernel of GROMACS, we exploit a suitable parallelism for CPE cluster and implement pipelining computation between MPE and CPE cluster. Optimi‐ zation strategies including the efficient use of scratchpad, the software-emulated cache and a hybrid parallel algorithm are adopted to solve the challenging memory bandwidth limitation. When comparing the refactored version using MPE and 64 CPEs with the original ported version using only MPE, we achieve a 16x speedup for the compute-intensive kernel. For simulating a molecule with 3 million atoms, we currently have managed to scale to 798,720 cores. Moreover, we analyze the adaptability of our mapping and optimization strategies for solving the memory bandwidth limitation when refactoring a real-world application on the Sunway heterogeneous many-core processor system.
منابع مشابه
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
متن کاملScalability study of molecular dynamics simulation on Godson-T many-core architecture
Molecular dynamics (MD) simulation has broad applications, and an increasing amount of computing power is needed to satisfy the large scale of the real world simulation. The advent of the many-core paradigm brings unprecedented computing power, but it remains a great challenge to harvest the computing power due to MD’s irregular memory-access pattern. To address this challenge, this paper prese...
متن کاملA Multi-Core Pipelined Architecture for Parallel Computing
Parallel programming on multi-core processors has become the industry’s biggest software challenge. This paper proposes a novel parallel architecture for executing sequential programs using multi-core pipelining based on program slicing by a new memory/cache dynamic management technology. The new architecture is very suitable for processing large geospatial data in parallel without parallel pro...
متن کاملA Performance/Energy Analysis and Optimization of Multi-Core Architectures with Voltage Scaling Techniques
In this paper, we develop asymptotic analysis and simulation models to better understand the characteristics of performance and energy consumption in a multi-core processor design in which dynamic voltage scaling is used. Our asymptotic model is derived using Amdahl’s law, Rent’s rule and power equations to derive the optimum number of cores and their voltage levels. Our model can predict the p...
متن کاملCombining Coarse-Grained Software Pipelining with DVS for Scheduling Real-Time Periodic Dependent Tasks on Multi-Core Embedded Systems
In this paper, we combine coarse-grained software pipelining with DVS (Dynamic Voltage/Frequency Scaling) for optimizing energy consumption of streambased multimedia applications on multi-core embedded systems. By exploiting the potential of multi-core architecture and the characteristic of streaming applications, we propose a two-phase approach to solve the energy minimization problem for peri...
متن کامل